ABSTRACT

Background Musical abilities such as recognising 
music and singing performance serve as means for 
communication and are instruments in sexual selection. 
Specific regions of the brain have been found to be 
activated by musical stimuli, but these have rarely been 
extended to the discovery of genes and molecules 
associated with musical ability. 
Methods A total of 1008 individuals from 73 families 
were enrolled and a pitch-production accuracy test was 
applied to determine musical ability. To identify genetic 
loci and variants that contribute to musical ability, we 
conducted family-based linkage and association analyses, 
and incorporated the results with data from exome 
sequencing and array comparative genomic hybridisation 
analyses. 

Results We found significant evidence of linkage at
4q23 with the nearest marker D4S2986 (LOD=3.1),
whose supporting interval overlaps that of a previous
study in Finnish families, and identified an intergenic
single nucleotide polymorphism (SNP) (rs12510781,
p=8.4×10^-17) near UGT8, a gene highly expressed in
the central nervous system and known to act in brain
organisation. In addition, a non-synonymous SNP in UGT8
was revealed to be highly associated with musical ability
(rs4148254, p=8.0×10^-17), and a 6.2 kb copy number
loss near UGT8 showed a plausible association with
musical ability (p=2.9×10^-6).
Conclusions This study provides new insight into the
genetics of musical ability, exemplifying a methodology
to assign functional significance to synonymous and
non-coding alleles by integrating multiple experimental
methods.



INTRODUCTION 

Song as a communication signal and as an instrument
in sexual selection has been recognised since
it was first proposed by Darwin. 1-3 Musical ability
is a non-verbal and complex cognitive skill, and
appears to have a latent biological basis in that
infants can differentiate frequencies and 'carry a
tune' without receiving extensive formal musical
training.

Researchers have described certain aspects of 
how the architecture of the brain affects facets 
of musical ability. Perception and vocal production 
of singing seem to be based on the auditory and
motor domains of the brain. 4 5 Studies of impaired
language skills with spared musical abilities and
impaired musical abilities with normal language
skills have revealed a dissociation between these
two skill sets, 6 leading to the proposal of a distinct
mental module associated with separate neural 
substrates and a set of neurally isolatable process- 
ing components. A minority of humans exhibit 
extreme musical abilities in the form of either 
absolute pitch (the ability to accurately label tones 
with specific musical notes) or amusia (the inabil- 
ity to accurately identify and mimic tones). 7 8 

Recent studies have identified genetic components
of musical ability. For example, absolute
pitch has a significant familial basis and is predominant
in females. 9 A twin study has shown
substantial heritability for musical ability, 10 and
linkage studies have found loci for musical aptitude
and absolute pitch. 11 12 Some polymorphisms
of specific genes in association with musical ability
have begun to be reported, including variants of
AVPR1A and SLC6A4. 13 14

As part of the GENDISCAN study (GENe
DIScovery for Complex traits in large isolated families
of Asians of the Northeast), which was
designed to investigate genetic influences on
complex traits in extended Asian families of rural
Mongolia, we investigated the processing of pitch
using 1008 subjects from 73 families. Several features
of the GENDISCAN study were expected to increase
the power of genetic loci discovery in normal complex
traits, considering that the study population
(1) has little ethnic admixture, (2) consists of large
extended families, and (3) represents a
community-based population unbiased by
health status. 15

To overcome the difficulties of identifying
genetic variations underlying common complex
diseases, an approach that allows for recruitment
of homogeneous and isolated populations was proposed.
However, only a few studies have incorporated
this approach due to difficulties in sample
recruitment. The inner Mongolian steppes are
still inhabited by small populations; geographically
isolated populations are commonly found in rural
provinces of Mongolia. We recruited Mongolian
individuals from an isolated population with large
extended pedigrees. These individuals possess a
homogeneous genetic background and close genetic affinity to
populations of the northern part of East Asia. 16-19

Previously, binary familiarity tests, which indicate whether or
not song parts sound similar, have mostly been used to assess
musical ability. 10 20-22 After the pitch of a melody was shifted
one semitone higher or lower, participants were asked to classify
two melodies as the same or different. In this study, we created
a test to analyse subjects' acoustic outputs followed by hearing
specific tones using cochlear implants (CI). 23 24 There are
advantages to this approach, which include the possibility to
study musical ability as a whole and the better availability of
subjects. We determined the pitch discrimination limen with a
simulated CI coding strategy and employed the complementary
nature of linkage- and association-based methods for musical
ability. The functional importance of results was screened
through the incorporation of data from exome sequencing and
array-based comparative genomic hybridisation (aCGH). This
combined approach provides a method by which to discover
additional novel genetic loci underlying complex traits.

METHODS 

Study subjects and phenotype measurement 

In 2006, a total of 2008 volunteers were recruited in
Dashbalbar, Dornod Province, Mongolia for the GENDISCAN
project, 25-28 which was designed to discover the genetic
backgrounds of several complex traits (figure 1). For this project, we
selected an isolated population composed of large extended
families. This population is highly appropriate for gene
mapping research due to its genetic homogeneity, decreased
environmental heterogeneity, and restricted geographical
distribution. 29 Extended multi-generation families comprising a
small number of founders are known to increase the genetic
power. Traits included in this project are summarised in
online supplementary table S1.

In this study, we chose 1008 individuals who are derived
from 73 extended families and have precise pedigree structures.
Table 1 lists descriptive characteristics of the study population.
The average age of the participants is 31.0 years and 51.6% are
women. The family structure in this population is very complicated,
with multiple generations and many relative pairs, such as
1794 parent-offspring pairs, 734 full-siblings, 395 half-siblings,
and 888 avuncular pairs. The average family size and standard
deviation are 19.6 and 11.3, respectively. A peripheral blood
sample was collected from each study subject, and DNA was
extracted according to standard protocols. The extracted DNA
was stored in solution at -20°C.

To examine the musical ability of subjects, we used a
pitch-production accuracy (PPA) test based on the difference limen of
a pitch paradigm in a psychophysical experiment with a
simulated CI coding strategy. 31 PPA is given by

$$\mathrm{PPA} = 100 - 10 \times \left( \frac{|v_i - v_s|}{v_s} \times 100 \right),$$

that is, 10 points are subtracted for each 1% error, where $v_s$ is the
standard auditory frequency emitted by a pitch-producing
device and $v_i$ is the vocal pitch frequency produced by the
individual, who hears a specific tone through a headset and recites the
sound. 32 A harmonic tone complex with a sound pressure level
of 70 dB intensity and sex-dependent fundamental frequency
was used as a stimulus (see online supplementary table S2).

The participants with PPA values higher than 60 were categorised
as individuals with good musical ability because they
were consistently and accurately able to produce tones differing
by less than a semitone from one another; the number of subjects
with a PPA score over 60 was 357 (35.4%). However, for further
analyses, participants with borderline PPA values between 50 and
70 were excluded to eliminate ambiguous PPA values; the number
of subjects with a PPA score over 70 was 268 (26.6% of all 1008
subjects, or 31.1% of the 862 subjects retained after this exclusion).
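
The PPA formula and the exclusion of borderline scores can be illustrated with a short Python sketch. This is not the authors' scoring software; the function names are hypothetical, and the treatment of scores falling exactly on 50 or 70 is an assumption, since the text does not specify it.

```python
def ppa_score(v_i: float, v_s: float) -> float:
    """Pitch-production accuracy: 100 minus 10 points per 1% frequency error."""
    if v_s <= 0:
        raise ValueError("standard frequency v_s must be positive")
    percent_error = abs(v_i - v_s) / v_s * 100.0
    return 100.0 - 10.0 * percent_error


def classify_ppa(ppa: float, low: float = 50.0, high: float = 70.0) -> str:
    """Label a score; treating exactly 50 or 70 as borderline is an assumption."""
    if ppa > high:
        return "good"
    if ppa < low:
        return "poor"
    return "borderline"  # excluded from the further analyses


if __name__ == "__main__":
    # Hypothetical example: a 220 Hz stimulus reproduced at 224.4 Hz (2% sharp).
    score = ppa_score(224.4, 220.0)
    print(round(score, 1), classify_ppa(score))
```

Under this formula a 3% frequency error already lowers the score to 70, so the retained "good" group corresponds to errors below 3%, well under the roughly 6% frequency step of an equal-tempered semitone.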




Figure 1 Overview of the project for musical ability. The pitch-production accuracy test was used to measure musical ability of 1008 individuals
from 73 extended families of an isolated Mongolian population. We started with a genome-wide linkage study to identify potential causal loci
associated with musical ability, and subsequently conducted a family-based association test under the linkage peak on 4q23 (99-118 cM).
Furthermore, we used exome sequencing data in 40 founders and assessed copy number variants in 30 founders to explore plausible candidates
for causal variants of musical ability with additional validating experiments. CN, copy number; SNP, single nucleotide polymorphism.
Table 1 Descriptive characteristics of study participants

Characteristics                          Value
-------------------------------------------------------
Sample information
  No. of samples                         1008
  No. of females (%)                     520 (51.6)
  Mean (SD) age (in years)               31.0 (15.5)
  No. of samples with PPA score (%)
    >70                                  268 (26.6)
    >60                                  357 (35.4)
    <60                                  651 (64.6)
    <50                                  594 (58.9)
Family information
  No. of families                        73
  Mean size (SD) of family members       19.61 (11.3)
  No. of pairs
    Parent-offspring                     1794
    Full-sibling                         734
      Sister-sister                      198
      Brother-brother                    167
      Sister-brother                     369
    Half-sibling                         395
    Grandparent-grandchild               1202
    Avuncular pairs                      888
    First cousins                        598

PPA, pitch-production accuracy.



Genome-wide linkage scan and family-based association study 
under linkage region 

We genotyped 862 samples from 70 families with the deCODE
1039 microsatellite marker platform throughout the autosomes
for genome-wide linkage analysis. We checked family
relationships through PREST 33 using an average identity-by-descent
(IBD)-based method. PEDCHECK was used to examine
Mendelian inconsistencies in genotype data, 34 and
non-Mendelian genotype errors were detected with SimWalk. 35
After fixing the genotype errors, multipoint
identity-by-descent matrices were calculated at each 1 cM distance, and
converted using the Markov chain-Monte Carlo method by
LOKI. 36 We used the Kosambi mapping function (derived from
the deCODE map) to convert map distances into
recombination fractions. For the multipoint linkage analyses, the
Sequential Oligogenic Linkage Analysis Routines package was
used. 37 We performed 10 000 permutation tests using the
lodadj option to obtain the empirical p value. In addition, we
estimated the adjusted narrow-sense heritability (h²) (ie, the
proportion of phenotype variance attributable to additive
genetic variance). In all analyses, we used age and sex as
covariates.
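
As an illustration of the map-distance conversion mentioned above, the Kosambi mapping function relates a genetic distance d (in Morgans) to a recombination fraction r by r = 0.5 tanh(2d). The following minimal Python sketch shows the conversion in both directions; the function names are hypothetical and this is not the code used in the study.

```python
import math


def kosambi_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a map distance given in centimorgans."""
    d = abs(distance_cm) / 100.0  # convert cM to Morgans
    return 0.5 * math.tanh(2.0 * d)


def kosambi_map_distance(r: float) -> float:
    """Inverse mapping: map distance in centimorgans for a recombination fraction."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Width of the 1-LOD support interval reported in the Results (99-118 cM).
    print(kosambi_recombination_fraction(118.0 - 99.0))
```

For short intervals r is almost equal to the distance in Morgans (1 cM gives r of about 0.01), whereas for long intervals r approaches the unlinked value of 0.5.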

For further association analysis, 53 extended families composed
of 630 family members were genotyped using an
Illumina Human610-Quad BeadChip kit by Macrogen
(Macrogen Inc, Seoul, Korea). We evaluated the Mendelian
inconsistencies in single nucleotide polymorphism (SNP) data
using PEDCHECK. 34 Non-Mendelian genotype errors were
detected using Merlin. 38 SNP quality control assessment was
based on SNP call rate, marker error rate, and minor allele
frequency (MAF): a minimum per-SNP call rate of 99%, a marker
error rate of less than 1%, and an MAF higher than 5%. In addition, we
also removed genotypes with Hardy-Weinberg equilibrium
p values <1×10^-6. We focused on the putative linkage region
in chromosome 4 for this analysis (1-LOD unit support
interval: 99-118 cM). A total of 3424 SNPs that met quality
control criteria were included in the putative linkage region,
and the PBAT tool in HelixTree software (V6.4; GoldenHelix)
was used for the family-based association test (FBAT), which can
control for population stratification or population admixture. 15 39
The null hypothesis was 'linkage and no association (sandwich
variance)', 40 which can be useful for extended pedigrees by
calculating a robust variance. We used the generalised estimating
equation for the FBAT test statistic, and hypothesised an
additive model. The association result was adjusted by covariates of
age and sex.
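
The SNP quality control thresholds described above can be expressed as a simple filter. The Python sketch below is illustrative only: the function names are hypothetical, and the Hardy-Weinberg p value is computed with a 1-df chi-square test as an assumption, because the text does not state which Hardy-Weinberg test was applied.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(call_rate: float, error_rate: float, maf: float, hwe_p: float) -> bool:
    """Apply the thresholds quoted in the text to one SNP."""
    return (
        call_rate >= 0.99      # minimum per-SNP call rate of 99%
        and error_rate < 0.01  # less than 1% marker error rate
        and maf > 0.05         # MAF higher than 5%
        and hwe_p >= 1e-6      # removed when HWE p < 1x10^-6
    )


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP.
    p_value = hwe_chi2_p(30, 50, 20)
    print(passes_qc(call_rate=0.995, error_rate=0.002, maf=0.45, hwe_p=p_value))
```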



Screening functional significance of candidates using exome 
sequencing and aCGH data integration 

To assign a functional significance to candidates, we used 
exome sequencing data of 40 founders and 180K aCGH results 
of 30 founders, both of which were included in this study and 
previously genotyped in our group. The experimental summary
of each is described in the data supplement (see online supplementary
tables S3-S5, supplementary methods). Among SNPs and
short insertions/deletions (indels) called from exomes, we 
selected coding sequence SNPs and indels, and canonical splice- 
site variants as candidates, along with the copy number var- 
iants (CNVs) called from the aCGH experiment. Focusing on 
variants in the putative linkage region, we further narrowed 
our candidates by linkage disequilibrium (LD) estimation with 
the top 10 SNPs of our association study. Haploview software 
(V3.2) was used for this LD estimation. 
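
For readers unfamiliar with the LD measure used here, r² between two biallelic variants can be computed from haplotype frequencies as r² = D² / (pA(1-pA) pB(1-pB)), where D = pAB - pA pB. The Python sketch below assumes phased haplotype counts are already known; Haploview itself estimates haplotype frequencies from genotype data, a step that is not reproduced here, and the function name is hypothetical.

```python
def ld_r_squared(n11: int, n10: int, n01: int, n00: int) -> float:
    """r^2 between two biallelic loci from the four phased haplotype counts.

    n11: haplotypes carrying allele 1 at both loci
    n10: allele 1 at the first locus, allele 0 at the second
    n01: allele 0 at the first locus, allele 1 at the second
    n00: allele 0 at both loci
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n11 + n10) / total  # allele 1 frequency at the first locus
    p_b = (n11 + n01) / total  # allele 1 frequency at the second locus
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = n11 / total - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical counts for 80 founder haplotypes.
    print(round(ld_r_squared(9, 1, 2, 68), 2))
```

An r² of 1 means the two variants always occur on the same haplotypes, so one can serve as a perfect proxy for the other in an association test.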

Among the candidates showing a significant level of LD, we 
selected one SNP and one CNV to be genotyped in our study 
population and compared their p values with the association 
results. For the SNP selected, three-dimensional (3D) modelling 
was conducted to predict its functional impact on the corre- 
sponding protein (see online supplementary methods). 



RESULTS 

Family-based linkage and association study 

The heritability explained by the additive genetic portion of
musical ability was estimated as 40% (p<0.0001, 95% CI
20.4% to 59.6%), and linkage regions with LOD>1.0 were
found for musical ability from the genome-wide linkage scan
(see online supplementary table S6). The maximum LOD score
was 3.1 at chromosome 4q23 with the nearest marker D4S2986
(figure 2A), and the 1-LOD unit support interval of the
putative linkage region ranged from 99 cM
to 118 cM (figure 2B). In the next phase, we conducted FBAT
to identify candidate variants within the putative linkage
interval. Table 2 shows the top 10 SNPs that were significantly
associated with musical ability, and all of these have reached the
strict genome-wide significance of p<1×10^-8. The strongest
association (p=8.4×10^-17) was found for rs12510781, an
intergenic SNP near UGT8 (MIM 601291). The regional association
plot near UGT8 is shown in figure 2C, and plotted
recombination rates reflecting local LD structure were estimated from
HapMap data. Three other SNPs (rs10024217, rs1903364, and
rs12504058) were in moderate LD with rs12510781 (r²=0.4).
A synonymous SNP within UGT8 (rs4148255) also showed
significance in p value levels, despite the low LD with rs12510781
(p=2.7×10^-10, r²<0.1). The SNP with the second highest
significance (p=3.0×10^-13) was rs9307160 in the intron of
UNC5C (MIM 603610), and the others were located near
ALPK1 (MIM 607347) and ELOVL6 (MIM 611546).
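
To give a sense of scale for a LOD score of 3.1, a LOD can be converted to an approximate pointwise p value. The Python sketch below assumes the standard asymptotic result that 2 ln(10) × LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. This is an illustration only and is not how significance was assessed in this study, where empirical p values were obtained by permutation; the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Asymptotic one-sided pointwise p value for a LOD score (assumed mixture null)."""
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    # chi2 = 2 * ln(10) * LOD; for 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)),
    # and the mixture null halves that tail probability.
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10.0)))


if __name__ == "__main__":
    print(lod_to_pointwise_p(3.1))
```

Under this approximation a LOD of 3.0 corresponds to a pointwise p value of roughly 1×10^-4, which is why LOD scores near 3 are conventionally treated as significant evidence of linkage.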
Figure 2 Summary of genome-wide linkage and association results for musical ability. (A) Genome-wide linkage results for musical ability.
(B) The linkage peak on chromosome 4 and association plot under the linkage support region. The linkage support interval is indicated by a green
line (99-118 cM). The red dot is the top single nucleotide polymorphism (SNP) by family-based association test. The SNPs 2-10 are labelled with
green dots. (C) Regional plot of association results for SNPs from analysis (-log10 p) for UGT8 (±300 kb position from top SNP). The SNPs close to
rs12510781, the most significant SNP (blue diamond), are colour-coded to reflect their linkage disequilibrium with this SNP (r²<0.2, white;
0.2<r²<0.4, yellow; 0.4<r²<0.8, orange; r²>0.8, red).



Utilisation of exome sequencing and aCGH data to assign 
functional significance to candidate variants 

Among the candidates from the exome data (347 SNPs and
seven indels in the putative linkage region), we narrowed down
to four SNPs that were in strong LD with the top 10 SNPs
identified via FBAT (r²>0.6, online supplementary table S7).
We found that a non-synonymous SNP (nsSNP) in UGT8
(rs4148254) showed perfect LD with rs12510781, the most
significant SNP from FBAT (r²=1.0), and this SNP was genotyped
in 611 FBAT samples for the association analysis. As a result,
the LD between rs4148254 and rs12510781 was re-estimated
(r²=0.93), and the rs4148254 SNP was found to have the most
significant association with musical ability in this study
(p=8.0×10^-17). The effect estimate of this SNP in founder
samples was also higher than that of rs12510781 (OR=3.4,
95% CI 1.2 to 9.9 vs OR=3.0, 95% CI 1.1 to 8.2, online
supplementary tables S8, S9). The 3D modelling of UGT8 protein
showed that Pro226, which is changed to leucine by the SNP,
might be part of the loop exposed outside of the predicted 3D
structure, and the loop with the Pro226 residue contains
sequence motifs including TRFH domain docking and
USP7-binding motifs (see online supplementary figure S1).

Table 2 Top 10 SNPs significantly associated with musical ability by FBAT under the putative linkage region of chromosome 4

SNP         Position*    Effect  Other   Frequency of   p Value     Nearest   Location (distance)
                         allele  allele  effect allele  (FBAT)      gene(s)†
--------------------------------------------------------------------------------------------------
rs12510781  115 860 030  G       A       0.12           8.4×10^-17  UGT8      Intergenic (42.3 kb)
rs9307160    96 586 977  C       T       0.10           3.0×10^-13  UNC5C     Intronic (-)
rs17628408  113 574 860  G       A       0.91           7.1×10^-11  ALPK1     Intronic (-)
rs2074385   113 598 098  C       A       0.91           7.1×10^-11  ALPK1     Intergenic (14.8 kb)
rs4148255   115 764 226  A       G       0.88           2.7×10^-10  UGT8      Synonymous (-)
rs11097397   95 087 875  G       T       0.28           4.8×10^-10  -         Intergenic (-)
rs10024217  115 677 564  C       T       0.28           6.1×10^-9   UGT8      Intergenic (61.4 kb)
rs1903364   115 681 713  C       T       0.28           6.1×10^-9   UGT8      Intergenic (57.3 kb)
rs12504058  115 718 566  G       A       0.28           6.1×10^-9   UGT8      Intergenic (20.4 kb)
rs6845765   111 177 613  C       T       0.86           8.2×10^-9   ELOVL6    Intergenic (12.0 kb)

*Positions are based on Build 36 from NCBI.
†Nearest gene, within ±100 kb of the SNP.
FBAT, family-based association test; SNP, single nucleotide polymorphism.

At the level of CNVs, only one copy number (CN) loss was
found to have moderate LD with rs4148255, the fifth most
significant SNP in FBAT (r²=0.48; online supplementary table
S10). This CN loss (Chr4: 115 727 257-115 733 452) is located
5.6 kb upstream of the UGT8 gene. We genotyped it in 618
FBAT samples and the frequencies of heterozygous and
homozygous CN losses were shown to be 45.15% and 10.03% in our
study subjects (allele frequency=32.61%). This CNV was
negatively associated with musical ability (p=2.9×10^-6) and,
interestingly, a diploid status at this position was shown to
potentiate the positive effect of rs4148254 in founders (see
online supplementary table S11). In addition, we identified a
significant interaction effect between this CNV and rs4148254
using a logistic regression model (p=0.01).
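
The reported CN-loss allele frequency follows directly from the genotype frequencies: each heterozygote carries one loss allele and each homozygote carries two. The Python sketch below, with a hypothetical function name, reproduces that arithmetic from the rounded percentages quoted above.

```python
def cn_loss_allele_frequency(freq_het: float, freq_hom: float) -> float:
    """Allele frequency of a deletion from heterozygous and homozygous carrier frequencies."""
    if freq_het < 0 or freq_hom < 0 or freq_het + freq_hom > 1.0:
        raise ValueError("genotype frequencies must be non-negative and sum to at most 1")
    # Heterozygotes contribute one of their two alleles, homozygotes both.
    return freq_het / 2.0 + freq_hom


if __name__ == "__main__":
    print(cn_loss_allele_frequency(0.4515, 0.1003))
```

With the rounded inputs 45.15% and 10.03%, this gives about 32.6%, in line with the reported allele frequency of 32.61%.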

DISCUSSION 

In this study, we explored the genetic determinants of musical
ability by combining several methodologies, namely family-based
linkage and association studies supported by exome
sequencing and aCGH data analyses. This study was conducted
as a part of the GENDISCAN project, which was designed to 
discover the genetic backgrounds of complex traits in 
Mongolia. 

Musical ability is a well-known complex trait determined by 
multiple environmental and genetic factors. As this trait con- 
sists of several factors including perception, cognition, learning, 
and emotions, a variety of genes have an effect on one's 
musical ability, both independently and interactively. To discover
the genetic backgrounds of such complex traits, studies
should be designed from the outset to increase the power to
detect genetic loci. In this regard, our study has some strong
points as described in the Introduction and Methods, which 
include little ethnic admixture and large extended families. In 
addition, we excluded samples with borderline phenotypes 
from all the analyses to derive more accurate results. 

Our results support the view that musical ability is heritable 
and have shown significant evidence of linkage for musical 
ability in large families. Previously, a linkage study for musical 
aptitude was performed with samples in a small number of 
Finnish multigenerational families, composed of predominantly 
white subjects. That study found an association of the chromo- 
somal region 4q22 with musical aptitude in the Finnish study 
population, 11 which overlaps with our linkage interval on 
chromosome 4q. Despite several differences in methodology, we 
believe that overlapping results for musical ability in different 
ethnic populations enhance the reliability of this linkage region 
on chromosome 4q. 

We also discovered common variants strongly associated
with musical ability, suggesting a biological mechanism for this
finding. Five SNPs among the top 10, including the most
significant, were shown to lie near or within UGT8. In addition,
there was no LD structure between rs12510781 and rs4148255
(r²<0.1). These two unrelated variants on one gene, associated with the
same phenotype, increase the possibility of UGT8 being one of
the true susceptibility genes for musical ability.

To identify more detailed causal variants, we integrated additional
technologies such as exome sequencing and aCGH,
resulting in the discovery of another nsSNP in UGT8 and a CN
loss located 5.6 kb upstream of this gene. The SNP rs4148254,
which changes amino acid 226 of the UGT8 protein from
proline to leucine, was not included in the platform we used,
and has shown a lower p value than rs12510781 in our study
population (see online supplementary figure S1A,B). Because
the BLOSUM score 41 for this change is '-3', and PolyPhen-2 42
predicts this to be damaging, the SNP might affect the function
of the UGT8 protein. Moreover, this proline amino acid seems
to be conserved among vertebrates (see online supplementary
table S12). The three other SNPs (rs35308602, rs2074381, and 
rs3828539), which were in high LD (r 2 >0.6) with the top 10 
SNPs, were predicted to be benign by PolyPhen-2 and the
BLOSUM scores were '2', '1', and '-1', respectively (see online
supplementary table S7). In the case of the CN loss, even though it
was not more significant than the associated SNP allele, the
synergetic effect of this variant with rs4148254 was suggested
in the founder analysis.

The protein encoded by UGT8 is UDP glycosyltransferase 8,
which is highly expressed in the brain (see online supplementary
figure S2). It is the first enzyme involved in complex lipid bio- 
synthesis in the myelinating oligodendrocyte 43 and clearance of 
long-chain ceramides (lcCer). lcCer clearance in neurons is 
mediated by glucosylceramide synthase (GCS) and studies have 
shown that decreased GCS leads to abnormally high lcCer. 44 
A significant early downregulation in glial GCS expression was 
associated with an increase in UGT8 mRNA in Alzheimer's 
disease, 45 and some patients with Alzheimer's disease have 
been observed to preserve musical ability long after losing all 
other cognitive functions. 6 

Although this study primarily focused on UGT8, there are 
other genes such as UNC5C, ALPK1, and ELOVL6 equally 
worth our attention. The protein encoded by UNC5C plays a 
role in the chemorepulsive effect of netrin-1 in axon guidance. 
This gene was previously suggested as a susceptibility gene for 
musical ability in the Finnish linkage study. 11 Regarding the 
other two, one study has shown that mice homozygous for 
disrupted copies of Alpk1 exhibited coordination defects, 46 and
ELOVL6 was once reported as one of the susceptibility loci for 
attention-deficit/hyperactivity disorder in a genome-wide asso- 
ciation study. 47 Several previous findings, as listed above, have 
supported the neural involvement of those candidate genes; 
however, more evidence should be given to associate them with 
musical ability. 

Music is a complex cognitive skill in the neuronal network 
affected by several potential covariates. We first considered 
language ability as a potential covariate besides age and sex. 
However, we found no language skill defects in our study sub- 
jects, and previous studies have reported that it is possible for 
language skills to be impaired while musical abilities are 
spared (aphasia without amusia); likewise, musical abilities 
can be impaired while language skills are spared (amusia 
without aphasia). 6 48 In addition, more factors including 
special musical training, education status, and education dur- 
ation might be considered as potential covariates, since it has 
been reported that the skill of absolute pitch could be developed
at a very young age by special musical training. 49 50
However, our participants lived in an isolated area with a 
homogeneous culture, and most of them were educated in the 
same public school without any additional musical training. 
In this study, therefore, we did not take those factors into 
account for analyses. 

In summary, we have demonstrated for the first time that 
common genetic variants in UGT8 are associated with musical 
ability, exemplifying a methodology to assign functional signifi- 
cance to the results of various association studies, which in 
many cases yield synonymous or non-coding alleles. 